140 research outputs found
Computational disclosure control : a primer on data privacy protection
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.Includes bibliographical references (leaves 213-216) and index.Today's globally networked society places great demand on the dissemination and sharing of person specific data for many new and exciting uses. When these data are linked together, they provide an electronic shadow of a person or organization that is as identifying and personal as a fingerprint even when the information contains no explicit identifiers, such as name and phone number. Other distinctive data, such as birth date and ZIP code, often combine uniquely and can be linked to publicly available information to re-identify individuals. Producing anonymous data that remains specific enough to be useful is often a very difficult task and practice today tends to either incorrectly believe confidentiality is maintained when it is not or produces data that are practically useless. The goal of the work presented in this book is to explore computational techniques for releasing useful information in such a way that the identity of any individual or entity contained in data cannot be recognized while the data remain practically useful. I begin by demonstrating ways to learn information about entities from publicly available information. I then provide a formal framework for reasoning about disclosure control and the ability to infer the identities of entities contained within the data. I formally define and present null-map, k-map and wrong-map as models of protection. Each model provides protection by ensuring that released information maps to no, k or incorrect entities, respectively. The book ends by examining four computational systems that attempt to maintain privacy while releasing electronic information. These systems are: (1) my Scrub System, which locates personally-identifying information in letters between doctors and notes written by clinicians; (2) my Datafly II System, which generalizes and suppresses values in field-structured data sets; (3) Statistics Netherlands' pt-Argus System, which is becoming a European standard for producing public-use data; and, (4) my k-Similar algorithm, which finds optimal solutions such that data are minimally distorted while still providing adequate protection. By introducing anonymity and quality metrics, I show that Datafly II can overprotect data, Scrub and p-Argus can fail to provide adequate protection, but k-similar finds optimal results.by Latanya Sweeney.Ph.D
Sprees, a finite-state orthographic learning system that recognizes and generates phonologically similar spellings
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.Includes bibliographical references (leaves 31-32).by Latanya Sweeney.M.S
Location Privacy in Spatial Crowdsourcing
Spatial crowdsourcing (SC) is a new platform that engages individuals in
collecting and analyzing environmental, social and other spatiotemporal
information. With SC, requesters outsource their spatiotemporal tasks to a set
of workers, who will perform the tasks by physically traveling to the tasks'
locations. This chapter identifies privacy threats toward both workers and
requesters during the two main phases of spatial crowdsourcing, tasking and
reporting. Tasking is the process of identifying which tasks should be assigned
to which workers. This process is handled by a spatial crowdsourcing server
(SC-server). The latter phase is reporting, in which workers travel to the
tasks' locations, complete the tasks and upload their reports to the SC-server.
The challenge is to enable effective and efficient tasking as well as reporting
in SC without disclosing the actual locations of workers (at least until they
agree to perform a task) and the tasks themselves (at least to workers who are
not assigned to those tasks). This chapter aims to provide an overview of the
state-of-the-art in protecting users' location privacy in spatial
crowdsourcing. We provide a comparative study of a diverse set of solutions in
terms of task publishing modes (push vs. pull), problem focuses (tasking and
reporting), threats (server, requester and worker), and underlying technical
approaches (from pseudonymity, cloaking, and perturbation to exchange-based and
encryption-based techniques). The strengths and drawbacks of the techniques are
highlighted, leading to a discussion of open problems and future work
- …